Results 1 - 6 of 6
1.
EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference ; : 2644-2656, 2023.
Article in English | Scopus | ID: covidwho-20243588

ABSTRACT

In automated scientific fact-checking, machine learning models are trained to verify scientific claims given evidence. A major bottleneck for this task is the limited availability of large-scale training datasets in different domains, because data annotation requires domain expertise. However, multiple-choice question-answering datasets are readily available across many domains, thanks to modern online education and assessment systems. As one of the first steps towards addressing the scarcity of fact-checking datasets in scientific domains, we propose a pipeline, which we call Multi2Claim, for automatically converting multiple-choice questions into fact-checking data. By applying the proposed pipeline, we generated two large-scale datasets for scientific fact-checking: Med-Fact and Gsci-Fact, for the medical and general science domains, respectively. These two datasets are among the first examples of large-scale scientific fact-checking datasets. We developed baseline models for the verdict prediction task using each dataset. Additionally, we demonstrated that the datasets could be used to improve performance, measured by weighted F1, on existing fact-checking datasets such as SciFact, HEALTHVER, COVID-Fact, and CLIMATE-FEVER. In some cases, performance improved by up to 26%. The generated datasets are publicly available. © 2023 Association for Computational Linguistics.
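The abstract does not spell out how Multi2Claim turns a question into a claim. As a rough illustration only, the sketch below assumes a naive template that attaches each option to the question stem: the correct option yields a SUPPORTED claim and each distractor a REFUTED claim. The `MCQ` class, the `mcq_to_claims` function, and the label names are hypothetical, and the actual pipeline presumably rewrites question-answer pairs into fluent declarative claims.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MCQ:
    """A multiple-choice question (hypothetical schema)."""
    question: str
    options: list[str]
    answer_index: int  # index of the correct option


def mcq_to_claims(mcq: MCQ) -> list[tuple[str, str]]:
    """Naive template-based conversion: one (claim, label) pair per option.

    The correct option becomes a SUPPORTED claim; every distractor becomes
    a REFUTED claim. This is an illustrative stand-in, not Multi2Claim itself.
    """
    stem = mcq.question.rstrip("?").strip()
    claims = []
    for i, option in enumerate(mcq.options):
        label = "SUPPORTED" if i == mcq.answer_index else "REFUTED"
        claims.append((f"{stem}: {option}.", label))
    return claims


if __name__ == "__main__":
    q = MCQ("Which organ produces insulin?", ["Liver", "Pancreas", "Kidney"], 1)
    for claim, label in mcq_to_claims(q):
        print(label, claim)
```

This template yields one SUPPORTED claim and several REFUTED claims per question, so a practical pipeline would likely need to balance or subsample the labels.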

2.
5th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2023 ; : 444-447, 2023.
Article in English | Scopus | ID: covidwho-2306891

ABSTRACT

Sentiment analysis plays a critical role in revealing opinions expressed in text. We therefore apply it to discover the sentiment polarity of reviews of the Taiwan Social Distancing mobile application. This paper proposes a semi-supervised scheme for annotating the application's reviews. The scheme combines each review's numeric rating with lexicon-based sentiment. In addition, we perform sentiment analysis at the aspect level; based on the experiment, we selected three aspects to analyze. This paper also evaluates the proposed scheme by training bidirectional encoder representations from transformers (BERT) and a multilayer perceptron (MLP) as classification models on the sentiment labels the scheme produces. The results show that the proposed annotation scheme outperforms data annotation by the counterpart models. © 2023 IEEE.
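The abstract says the scheme combines numeric ratings with lexicon-based sentiment, but not how. One plausible rule, assumed here purely for illustration, keeps a label only when the star rating and the lexicon score agree and leaves the review unlabeled otherwise. The tiny `LEXICON`, the 1 to 5 star thresholds, and all function names are hypothetical.

```python
from typing import Optional

# Hypothetical toy lexicon: word -> polarity weight.
LEXICON = {"good": 1, "great": 2, "useful": 1, "bad": -1, "slow": -1, "useless": -2}
LABELS = {1: "positive", 0: "neutral", -1: "negative"}


def rating_polarity(stars: int) -> int:
    """Map a 1-5 star rating to +1, 0, or -1 (assumed thresholds)."""
    if stars >= 4:
        return 1
    if stars <= 2:
        return -1
    return 0


def lexicon_polarity(text: str) -> int:
    """Sum lexicon weights over lowercased tokens and return the sign."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    score = sum(LEXICON.get(t, 0) for t in tokens)
    return (score > 0) - (score < 0)


def annotate(text: str, stars: int) -> Optional[str]:
    """Return a sentiment label if both signals agree, else None (unlabeled)."""
    r = rating_polarity(stars)
    if r == lexicon_polarity(text):
        return LABELS[r]
    return None


if __name__ == "__main__":
    print(annotate("Great app, very useful", 5))
    print(annotate("Great app", 1))
```

Reviews left unlabeled by such a rule could then be labeled by a classifier trained on the agreeing subset, which is one common way to make an annotation scheme semi-supervised.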

3.
60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 ; 1:2736-2749, 2022.
Article in English | Scopus | ID: covidwho-2274256

ABSTRACT

News events are often associated with quantities (e.g., the number of COVID-19 patients or the number of arrests in a protest), and analyzing these quantity events often requires extracting their type, time, and location from unstructured text. This paper therefore formulates the NLP problem of spatiotemporal quantity extraction and proposes the first meta-framework for solving it. The meta-framework comprises a formalism that decomposes the problem into several information extraction tasks, a shareable crowdsourcing pipeline, and transformer-based baseline models. We demonstrate the meta-framework in three domains (the COVID-19 pandemic, Black Lives Matter protests, and the 2020 California wildfires) to show that the formalism is general and extensible, that the crowdsourcing pipeline facilitates fast, high-quality data annotation, and that the baseline system handles spatiotemporal quantity extraction well enough to be practically useful. We release all resources for future research on this topic. © 2022 Association for Computational Linguistics.
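The abstract does not list the formalism's sub-tasks. The sketch below assumes a simple record holding a quantity span plus its type, time, and location, and implements only a naive first step (regex-based recognition of numeric spans), leaving the remaining fields for later models to fill. `QuantityEvent` and `recognize_quantities` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantityEvent:
    """One extracted quantity and its (assumed) spatiotemporal attributes."""
    span: str
    value: float
    quantity_type: Optional[str] = None  # e.g. "confirmed cases"; filled by a later model
    time: Optional[str] = None           # filled by a later model
    location: Optional[str] = None       # filled by a later model


# Comma-grouped integers (1,234) or plain integers/decimals (42, 3.5).
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?")


def recognize_quantities(sentence: str) -> list[QuantityEvent]:
    """Naive quantity recognition: every numeric span becomes a candidate event."""
    events = []
    for m in NUMBER.finditer(sentence):
        value = float(m.group().replace(",", ""))
        events.append(QuantityEvent(span=m.group(), value=value))
    return events


if __name__ == "__main__":
    for ev in recognize_quantities("Officials reported 1,234 new cases in Los Angeles on Monday."):
        print(ev)
```

A pattern this crude also matches years such as 2020, which illustrates why typing, time, and location need their own extraction steps.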

4.
PeerJ Comput Sci ; 8: e1151, 2022.
Article in English | MEDLINE | ID: covidwho-2155752

ABSTRACT

Since the inception of the COVID-19 pandemic, related misleading information has spread at a remarkable rate on social media, with serious implications for individuals and societies. Although COVID-19 appears to be ending in most places after the sharp shock of Omicron, severe new variants can emerge and cause new waves, especially if they can evade the insufficient immunity provided by prior infection and incomplete vaccination. Fighting fake news that promotes vaccine hesitancy, for instance, is crucial to the success of global vaccination programs and thus to achieving herd immunity. To combat the proliferation of COVID-19-related misinformation, considerable research effort has been, and is still being, dedicated to building and sharing COVID-19 misinformation detection datasets and models for Arabic and other languages. However, most of these datasets provide only binary (true/false) misinformation classifications, and the few studies that support multi-class misinformation classification either deal with a small set of misinformation classes or mix them with situational information classes. Not all false news stories about COVID-19 are equal; some have more sinister effects than others (e.g., fake cures and false vaccine information). Identifying the sub-type of misinformation is therefore critical for choosing a suitable action based on its level of seriousness, ranging from assigning a warning label to a suspicious post to removing a misleading post instantly. In this work, we develop comprehensive annotation guidelines that define 19 fine-grained misinformation classes. We then release the first Arabic COVID-19-related misinformation dataset, comprising about 6.7K tweets with multi-class and multi-label misinformation annotations. In addition, we release a version of the dataset that is the first Arabic Twitter dataset annotated exclusively with six situational information classes.
Identifying situational information (e.g., caution, help-seeking) helps authorities and individuals understand the situation during emergencies. To confirm the validity of the collected data, we define three classification tasks and experiment with various machine learning and transformer-based classifiers to provide baseline results for future research. The experimental results indicate the quality and validity of the data and its suitability for building misinformation and situational information classification models. The results also demonstrate the superiority of AraBERT-COV19, a transformer-based model pretrained on COVID-19-related tweets, which achieved micro-averaged F-scores of 81.6% and 78.8% on the multi-class misinformation and situational information classification tasks, respectively. Label Powerset with a linear SVC achieved the best performance among the presented methods for multi-label misinformation classification, with a micro-averaged F-score of 76.69%.
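Label Powerset turns a multi-label problem into a multi-class one by treating every distinct combination of labels as a single class; any multi-class learner (a linear SVC in the abstract) can then be trained on the resulting class ids. The stdlib-only sketch below shows that encoding step and a micro-averaged F1 that pools true positives, false positives, and false negatives over all labels. The function names and example labels are illustrative assumptions, and the classifier itself is omitted.

```python
from __future__ import annotations


def label_powerset_encode(label_sets: list[set[str]]) -> tuple[list[int], dict[int, frozenset[str]]]:
    """Map each distinct label combination to one class id.

    Returns the class id per instance and a decoder from id back to labels.
    """
    combo_to_id: dict[frozenset[str], int] = {}
    y = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in combo_to_id:
            combo_to_id[key] = len(combo_to_id)
        y.append(combo_to_id[key])
    id_to_combo = {i: combo for combo, i in combo_to_id.items()}
    return y, id_to_combo


def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all instances and labels."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y, decode = label_powerset_encode([{"fake_cure"}, {"fake_cure", "vaccine"}, set()])
    print(y, decode[1])
```

Predicted class ids are mapped back to label sets with the decoder. A known limitation of this transformation is that the model can only predict label combinations that appeared in the training data.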

5.
30th Italian Symposium on Advanced Database Systems, SEBD 2022 ; 3194:427-436, 2022.
Article in English | Scopus | ID: covidwho-2027121

ABSTRACT

A protein contact network (PCN) is an emerging paradigm for modelling protein structure. A common approach to interpreting such data is network-based analysis, and it has been shown that clustering analysis may discover allostery in PCNs. Moreover, network embedding has shown good performance in discovering hidden communities and structures in networks. SARS-CoV-2 proteins, and in particular the S protein, have a modular structure that needs to be annotated to understand the complex mechanisms of infection. Such annotations, in particular highlighting the regions that participate in binding human ACE2 and TMPRSS, may help in designing tailored strategies for preventing and blocking infection. In this work, we compare several graph embedding approaches with classical clustering approaches for annotating protein structures. The results show that embedding may reveal interesting structures that constitute a starting point for further analysis. © 2022 CEUR-WS. All rights reserved.
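A PCN is commonly built by treating residues as nodes and linking residues whose alpha-carbon atoms lie within a distance window. The 4 to 8 angstrom window below is an assumption (a range often used in the PCN literature, where the lower bound drops trivial contacts between chain neighbours about 3.8 angstroms apart). The resulting graph is what the clustering or graph embedding methods compared in the abstract would operate on; `build_pcn` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def build_pcn(coords, min_dist=4.0, max_dist=8.0):
    """Build a protein contact network as an adjacency dict.

    coords: list of (x, y, z) alpha-carbon coordinates, one per residue.
    Residues i and j are linked when their distance lies in
    [min_dist, max_dist] angstroms (assumed window).
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if min_dist <= d <= max_dist:
            adj[i].add(j)
            adj[j].add(i)
    return adj


if __name__ == "__main__":
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 5.0, 0.0), (20.0, 0.0, 0.0)]
    print(build_pcn(toy))
```

In the toy example, residues 0 and 1 (3.8 angstroms apart) stay unlinked because they fall below the lower bound, while residue 3 is isolated because it lies too far from everything else.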

6.
4th International Conference on Intelligent Technologies and Applications, INTAP 2021 ; 1616 CCIS:287-299, 2022.
Article in English | Scopus | ID: covidwho-1971561

ABSTRACT

Social media has become popular for social interaction and as a source of news. Users spread misinformation in multiple data formats. However, systematically studying social media phenomena has been challenging due to the lack of labelled data. This paper presents AMUSED, a semi-automated annotation framework for gathering multilingual, multimodal annotated data from social networking sites. The framework is designed to reduce the workload of collecting and annotating social media data by cohesively combining machines and humans in the data collection process. AMUSED detects links to social media posts in a given list of news articles, then downloads the data from the respective social networking sites and labels it. The framework gathers annotated data from multiple platforms, such as Twitter, YouTube, and Reddit. As a use case, we implemented the framework to collect COVID-19 misinformation data from different social media sites and categorised 8,077 fact-checked articles into four classes of misinformation. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
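The first AMUSED step, detecting links to social media posts inside news articles, can be approximated with URL patterns. The sketch below assumes simplified patterns for Twitter status, YouTube video, and Reddit comment URLs (real posts have more URL variants) and groups deduplicated matches by platform. Downloading and labelling are not shown, and `PLATFORM_PATTERNS` and `detect_post_links` are hypothetical names, not the framework's actual code.

```python
import re
from collections import defaultdict

# Hypothetical, simplified URL patterns per platform (no capture groups,
# so findall returns whole matches).
PLATFORM_PATTERNS = {
    "twitter": re.compile(r"https?://(?:www\.)?twitter\.com/\w+/status/\d+"),
    "youtube": re.compile(r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+"),
    "reddit": re.compile(r"https?://(?:www\.)?reddit\.com/r/\w+/comments/\w+"),
}


def detect_post_links(article_html):
    """Return {platform: [unique post URLs in order of appearance]}."""
    found = defaultdict(list)
    for platform, pattern in PLATFORM_PATTERNS.items():
        for url in pattern.findall(article_html):
            if url not in found[platform]:
                found[platform].append(url)
    return dict(found)


if __name__ == "__main__":
    html = ('<a href="https://twitter.com/who/status/12345">t</a> '
            '<a href="https://youtu.be/abc_D-1">v</a>')
    print(detect_post_links(html))
```

Platforms with no matching links are simply absent from the result, so a downstream downloader only needs to iterate over the keys that are present.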
